Consider the data below:

Let’s estimate a GAM:

m1 <- bam(y ~ s(time, k = 20), data = curves)
summary(m1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(time, k = 20)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.1795848  0.0004607   389.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df    F p-value    
## s(time) 15.52   17.6 6060  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.915   Deviance explained = 91.5%
## fREML = -16489  Scale est. = 0.0021116  n = 9950
plot_smooth(m1, view = "time", col = "slateblue4", print.summary = FALSE, rug = FALSE)

Now, imagine that the same data set was not a collection of time series. For example instead of time we could have price of houses (in millions of euros), and y could be some measure of demand. The plot would be like this:

Question: How would you change the GAM above to reflect the fact that there are no curves but it is all one data set?

AR1 residuals

gam.check(m1)

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 9 iterations.
## Gradient range [-6.31743e-06,6.479967e-06]
## (score -16488.76 & scale 0.002111591).
## Hessian positive definite, eigenvalue range [8.040793,4974.011].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##           k'  edf k-index p-value
## s(time) 19.0 15.5    1.03    0.98

We note a strong evidence for heteroscedasticity. Where does it come from?

Look at the variation in the large peak. When y is high we are around the peak, and that is also where the largest variation across curves occurs, hence the wider residual span for higher fitted values.

Now let’s consider one particular curve, say one whose first peak is higher than the mean. As we see from the raw curves, we expect that if the residual error for this particular curve at say time = 0.4 is positive, it will also be positive in the next time samples. Hence neighbouring residuals are correlated, which violates one of the hypotheses of the model. We would like to express this fact in the GAMM.

A way to alleviate the problem above is to induce a structure in the residuals as follows: \[ \epsilon_i = \rho \cdot \epsilon_{i-1} + \psi_i \] which means that the residual at point \(i\) is proportional to the residual at point \(i-1\), i.e. one step earlier on the time axis, plus another unknown term \(\psi_i\), the latter values distributed as \(N(0, \sigma^2)\) and independent. The GAMM will not estimate the parameter \(\rho\) for us directly, rather we meed to set it ourselves. This type of sub-model for the noise is well known in signal processing and it’s called AR1, i.e. auto regressive model of order 1 (because it only depends on one step in the past).

What we typically do is this:

  1. Estimate a GAMM without AR1 term (like above)
  2. Compute autocorrelation of residuals
  3. Re-run the GAMM setting \(\rho\) equal to the autocorrelation value at lag 1

The following estimates \(\rho\) and plots the complete autocorrelation function for the residuals:

m1.acf <- acf_resid(m1)

This function estimates the correlation of residuals with themselves in the past at different lags. We take the value as lag 1 as our estimate for \(\rho\)

rho <- m1.acf[2]
# at index 1 we have lag 0, whose value is 1 by definition
# at index 2 we have lag 1, which is what we need
rho
##         1 
## 0.8045908

Now we re-run the GAMM. We set \(\rho\) by specifying the argument rho. We also need to indicate where the start of curves in the data are, otherwise the AR1 model will be also applied between the last and the first sample of curves (argument AR.start).

m1.ar1 <- bam(y ~ s(time, k = 20), data = curves,
              rho = rho, AR.start = curves$time == 0
              )
summary(m1.ar1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(time, k = 20)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.179576   0.001376   130.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(time) 12.43  15.13 804.6  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.915   Deviance explained = 91.5%
## fREML = -21706  Scale est. = 0.0020926  n = 9950
plot_smooth(m1.ar1, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
acf_resid(m1.ar1)
gam.check(m1.ar1)
## 
## Method: fREML   Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.51787e-05,1.483733e-05]
## (score -21705.86 & scale 0.002092649).
## Hessian positive definite, eigenvalue range [6.899551,4974.007].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##           k'  edf k-index p-value
## s(time) 19.0 12.4       1    0.41

The results show that:

Curve-specific random factors

The way to explicitly represent the information about curves, i.e. which sample belongs to which curve, is to introduce a random smooth factor at the curve level:

m1.randCurves <- bam(y ~ s(time, k = 20) + 
                       + s(time, curveId, bs = "fs", m=1, k = 15)
                       , nthreads = 2
                       , data = curves)
summary(m1.randCurves)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ s(time, k = 20) + +s(time, curveId, bs = "fs", m = 1, k = 15)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.179574   0.004535    39.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                    edf Ref.df      F p-value    
## s(time)          17.69   18.6 671.54  <2e-16 ***
## s(time,curveId) 505.63  749.0  56.77  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.984   Deviance explained = 98.5%
## fREML = -24077  Scale est. = 0.00039985  n = 9950
plot_smooth(m1.randCurves, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
op <- par(mfrow=c(2,2))
gam.check(m1.randCurves)
## 
## Method: fREML   Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.198123e-07,1.173967e-07]
## (score -24076.94 & scale 0.0003998499).
## Hessian positive definite, eigenvalue range [8.57444,4985.08].
## Model rank =  770 / 770 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                    k'   edf k-index p-value
## s(time)          19.0  17.7       1    0.43
## s(time,curveId) 750.0 505.6       1    0.42
acf_resid(m1.randCurves)
par(op)

The results show that:

As alternative to the mgcv library, you can try the sparseFLMM library, which incorporates curve-level random smooths by default and it performs an efficient estimation based on functional PCA. sparseFLMM is (in my opinion) generally less user-friendly than mgcv.

AR1 residuals and latent factors

This is a curve dataset with a 4-level factor F4. An AR1 residual with coefficient \(\rho =\) 0.9 is added to the 4 expected curves.

Let us first model it as: y ~ F4 + s(t, by = F4), i.e. a regular GAM with one factor and no AR1 residual.

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.503876   0.001413 356.636  < 2e-16 ***
## F4A.2        0.482954   0.001998 241.709  < 2e-16 ***
## F4B.1       -0.011297   0.001998  -5.654 1.68e-08 ***
## F4B.2        0.455562   0.001998 228.000  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df     F p-value    
## s(t):F4A.1 8.638  8.962 12669  <2e-16 ***
## s(t):F4A.2 8.816  8.990 14264  <2e-16 ***
## s(t):F4B.1 8.551  8.942 12218  <2e-16 ***
## s(t):F4B.2 8.889  8.996  6972  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.992   Deviance explained = 99.2%
## fREML = -6733.5  Scale est. = 0.0020361  n = 4080

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-2.764712e-06,2.764141e-06]
## (score -6733.482 & scale 0.002036088).
## Hessian positive definite, eigenvalue range [3.777704,2036.029].
## Model rank =  40 / 40 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##              k'  edf k-index p-value
## s(t):F4A.1 9.00 8.64    1.07       1
## s(t):F4A.2 9.00 8.82    1.07       1
## s(t):F4B.1 9.00 8.55    1.07       1
## s(t):F4B.2 9.00 8.89    1.07       1

Notice that:

Let us add the AR1 term to the model, taking the estimated value \(\hat{\rho} =\) 0.8851126:

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.503812   0.004994 100.876   <2e-16 ***
## F4A.2        0.483003   0.007063  68.382   <2e-16 ***
## F4B.1       -0.011101   0.007063  -1.572    0.116    
## F4B.2        0.455781   0.007063  64.527   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df    F p-value    
## s(t):F4A.1 8.654  8.974 1630  <2e-16 ***
## s(t):F4A.2 8.819  8.993 1889  <2e-16 ***
## s(t):F4B.1 8.561  8.958 1586  <2e-16 ***
## s(t):F4B.2 8.895  8.998 1494  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.992   Deviance explained = 99.2%
## fREML = -10010  Scale est. = 0.0018469  n = 4080

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 6 iterations.
## Gradient range [-5.816972e-05,5.753168e-05]
## (score -10009.53 & scale 0.001846912).
## Hessian positive definite, eigenvalue range [3.784681,2036.029].
## Model rank =  40 / 40 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##              k'  edf k-index p-value
## s(t):F4A.1 9.00 8.65    1.07       1
## s(t):F4A.2 9.00 8.82    1.07       1
## s(t):F4B.1 9.00 8.56    1.07       1
## s(t):F4B.2 9.00 8.89    1.07       1

Notice that:

Let us now change two things:

  1. Pretend we do not know there are 4 levels, but only two levels A and B
  2. Remove the AR1 noise and introduce uncorrelated noise

Let us model this dataset as: y ~ F2 + s(t, by = F2)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.745345   0.006508 114.524   <2e-16 ***
## F2B         -0.018165   0.009204  -1.974   0.0485 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(t):F2A 5.541  6.686 752.7  <2e-16 ***
## s(t):F2B 6.539  7.675 435.9  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.673   Deviance explained = 67.4%
## fREML = 822.95  Scale est. = 0.086408  n = 4080

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-5.533282e-07,4.192454e-07]
## (score 822.9545 & scale 0.08640774).
## Hessian positive definite, eigenvalue range [2.154157,2038.006].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##            k'  edf k-index p-value    
## s(t):F2A 9.00 5.54     0.1  <2e-16 ***
## s(t):F2B 9.00 6.54     0.1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that:

Let us blindly apply the AR1 recipe, i.e. take the estimated \(\hat{\rho} =\) 0.9911937 and re-fit the model:

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.74514    0.03829  19.458   <2e-16 ***
## F2B         -0.01803    0.05416  -0.333    0.739    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F p-value    
## s(t):F2A 8.194  8.870 193.5  <2e-16 ***
## s(t):F2B 8.610  8.969 254.0  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.672   Deviance explained = 67.4%
## fREML = -7751.7  Scale est. = 0.067817  n = 4080

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-2.848765e-09,2.66256e-09]
## (score -7751.663 & scale 0.06781707).
## Hessian positive definite, eigenvalue range [3.560744,2038.013].
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##            k'  edf k-index p-value    
## s(t):F2A 9.00 8.19     0.1  <2e-16 ***
## s(t):F2B 9.00 8.61     0.1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that:

Let us instead add a curve-specific random smooth term (and remove the AR1 term):

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## y ~ F2 + s(t, by = F2, k = 10) + s(t, curveId, by = F2, bs = "fs", 
##     m = 1)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.74534    0.03882  19.202   <2e-16 ***
## F2B         -0.01816    0.05356  -0.339    0.735    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf  Ref.df       F p-value    
## s(t):F2A           8.023   8.139   73.92  <2e-16 ***
## s(t):F2B           8.592   8.645  112.93  <2e-16 ***
## s(t,curveId):F2A 348.851 359.000 1201.31  <2e-16 ***
## s(t,curveId):F2B 347.848 359.000 1184.22  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.998   Deviance explained = 99.9%
## fREML = -8195.9  Scale est. = 0.00040811  n = 4080

## 
## Method: fREML   Optimizer: perf newton
## full convergence after 16 iterations.
## Gradient range [-1.997467e-05,1.62132e-05]
## (score -8195.885 & scale 0.0004081082).
## Hessian positive definite, eigenvalue range [3.384527,2063.451].
## Model rank =  1460 / 1460 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                      k'    edf k-index p-value
## s(t):F2A           9.00   8.02    0.99    0.32
## s(t):F2B           9.00   8.59    0.99    0.33
## s(t,curveId):F2A 720.00 348.85    0.99    0.34
## s(t,curveId):F2B 720.00 347.85    0.99    0.30

Notice that:

Take home message